
Optimize PipelineConfiguration-checking ClusterStateListeners #117038

Merged: 14 commits into elastic:main, Nov 19, 2024

Conversation


@joegallo joegallo commented Nov 19, 2024

Before (this PR and the various semi-associated PRs that have already been merged):

[Screenshot: 2024-11-19 at 10:47:11 AM]

After (this PR and the various semi-associated PRs that have already been merged):

[Screenshot: 2024-11-19 at 10:47:28 AM]

I ran into this in the /_nodes/hot_threads output of a real in-the-wild cluster and it's been on my list ever since.

Two of our ClusterStateListeners operate on PipelineConfiguration: the GeoIpDownloaderTaskExecutor and the IndexTemplateRegistry. Both ask the PipelineConfiguration for details that it doesn't actually have on hand, so we have to parse the XContent of the pipeline in order to retrieve them. As a consequence, we're parsing the XContent of some of these pipelines twice (once per listener) on every cluster state update. In the case of the GeoIpDownloaderTaskExecutor we have to do that for every pipeline there is, so we're adding a cost to the master that scales with the number of pipelines; for clusters with large numbers of pipelines, the actual CPU cost expended can start to matter (computers are very fast, of course, but things like this add up). This doesn't change the big-O of that part of the work (we're still asking questions of every pipeline), but it does change the code so that we can answer those questions with objects that are already in memory rather than needing to parse yaml-or-json to get them.

In terms of implementation, PipelineConfiguration holds the core of the changes. Rather than keeping the unparsed XContent around, it now maintains an unmodifiable parsed version of the same data. Other parts of ingest (most notably, creating a Pipeline from a PipelineConfiguration) rely on having a mutable copy of the PipelineConfiguration's config, so there's a getter that accepts a boolean which can be used to ask for that.
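
The copy-on-read idea described above can be sketched roughly as follows. This is a hypothetical, self-contained illustration, not the actual Elasticsearch code: `innerDeepCopy` and `Collections.unmodifiableList` do appear in the PR's diff, but the class name, `main` method, and sample config here are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: recursively copy a parsed pipeline config,
// optionally wrapping each container as unmodifiable.
class DeepCopySketch {
    static Object innerDeepCopy(Object value, boolean unmodifiable) {
        if (value instanceof Map<?, ?> map) {
            Map<Object, Object> copy = new LinkedHashMap<>(map.size());
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                copy.put(entry.getKey(), innerDeepCopy(entry.getValue(), unmodifiable));
            }
            return unmodifiable ? Collections.unmodifiableMap(copy) : copy;
        } else if (value instanceof List<?> list) {
            List<Object> copy = new ArrayList<>(list.size());
            for (Object item : list) {
                copy.add(innerDeepCopy(item, unmodifiable));
            }
            return unmodifiable ? Collections.unmodifiableList(copy) : copy;
        } else {
            // Scalars (String, Number, Boolean, null) are immutable; share them.
            return value;
        }
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Map<String, Object> config = new LinkedHashMap<>();
        config.put("processors", new ArrayList<>(List.of(Map.of("set", Map.of("field", "foo")))));

        Map<String, Object> frozen = (Map<String, Object>) innerDeepCopy(config, true);
        Map<String, Object> mutable = (Map<String, Object>) innerDeepCopy(config, false);

        mutable.put("description", "ok");      // the mutable copy accepts writes
        boolean threw = false;
        try {
            frozen.put("description", "nope"); // the unmodifiable copy rejects them
        } catch (UnsupportedOperationException e) {
            threw = true;
        }
        System.out.println(threw && mutable.containsKey("description")); // prints "true"
    }
}
```

Scalars are shared rather than copied here because String, Number, and Boolean are immutable in Java; only the containers need fresh (or wrapped) instances.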


Related to #116988, #115355, #115348, and #115347 -- those were small 'jab' optimization PRs laying the groundwork for this final one. Similarly see also #115425, #115423, and #115421 -- the cleanups in those PRs were mostly intended to make this PR more readable and simpler in the end.

Closes #97382

@joegallo joegallo requested a review from a team as a code owner November 19, 2024 16:02
@elasticsearchmachine elasticsearchmachine added labels needs:triage (Requires assignment of a team area label) and v9.0.0, Nov 19, 2024
@joegallo joegallo force-pushed the optimize-pipeline-configuration-checks branch from ab574b1 to b41f212 on November 19, 2024 16:06
@joegallo joegallo added labels :Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP), >refactoring, Team:Data Management (Meta label for data/management team), auto-backport (Automatically create backport pull requests when merged), and v8.17.0, and removed label needs:triage (Requires assignment of a team area label), Nov 19, 2024
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@original-brownbear (Member) left a comment


LGTM from my end :) Seems this should serialize a little more compactly than JSON as well, and it removes the weird tracking of the content type :)

if (in.getTransportVersion().onOrAfter(TransportVersions.INGEST_PIPELINE_CONFIGURATION_AS_MAP)) {
    config = in.readGenericMap();
} else {
    final BytesReference bytes = in.readBytesReference();
Member


NIT: Not too important since it's BwC and these things aren't gigantic, but in the spirit of doing this consistently you could actually use readSlicedBytesReference here.

@joegallo (Contributor Author)


Nice, okay, 8467326.

Member


in the spirit of doing this consistently, you could use readSlicedBytesReference actually here.

@original-brownbear can you explain more about this? readSlicedBytesReference() just calls readBytesReference(), so is it just for naming consistency?

Member


That delegation is only the default behavior. For the real-world network buffers we have here, there are overriding implementations that just slice the underlying buffer without copying.

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thanks for the explanation!
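
The slice-versus-copy distinction described above can be illustrated with plain java.nio. This is an analogy using standard-library buffers, not the Elasticsearch StreamInput implementation; the string contents and variable names are invented for the example.

```java
import java.nio.ByteBuffer;

// Plain java.nio illustration of slice-vs-copy (not Elasticsearch code):
// a slice is a view over the same backing array, while a copy allocates.
class SliceVsCopy {
    public static void main(String[] args) {
        ByteBuffer network = ByteBuffer.wrap("header|payload".getBytes());
        network.position(7); // pretend the 7-byte header has already been read

        // Copy: allocates a fresh array and copies the remaining bytes into it.
        byte[] copied = new byte[network.remaining()];
        network.duplicate().get(copied);

        // Slice: a zero-copy view sharing the original backing array.
        ByteBuffer sliced = network.slice();

        System.out.println(new String(copied));                                 // payload
        System.out.println(sliced.hasArray() && sliced.array() == network.array()); // true
    }
}
```

The payoff is the same shape as what's described above: returning a view over the already-received network buffer avoids allocating and copying for every read.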

        copy.add(innerDeepCopy(itemValue, unmodifiable));
    }
    return unmodifiable ? Collections.unmodifiableList(copy) : copy;
} else if (value == null || value instanceof String || value instanceof Number || value instanceof Boolean) {
Member


Could just put this condition in the assertion since it only has an effect with assertions on anyway, otherwise it's the same as the else branch?

@joegallo (Contributor Author)


I was on the fence about that one; I'm happy to take your comment as a tiebreaker in the opposite direction: 659ad01.
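
Concretely, the suggestion and its resolution look something like this hypothetical before/after sketch (not the actual PR diff; the class and method names are invented):

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the reviewer's suggestion: since the scalar type
// check only matters when assertions are enabled, fold it into the assert
// and let scalars share the plain pass-through branch.
class AssertFoldSketch {
    static Object deepCopy(Object value) {
        if (value instanceof Map<?, ?> || value instanceof List<?>) {
            // containers would be recursively copied here (elided)
            return value;
        } else {
            // before: `else if (value == null || value instanceof String || ...)`
            // after: the type whitelist lives only in the assertion
            assert value == null || value instanceof String || value instanceof Number || value instanceof Boolean
                : "unexpected config value type";
            return value;
        }
    }

    public static void main(String[] args) {
        System.out.println(deepCopy("a string")); // prints "a string"
    }
}
```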

@masseyke (Member) left a comment


LGTM

@joegallo joegallo merged commit 123b103 into elastic:main Nov 19, 2024
17 checks passed
@joegallo joegallo deleted the optimize-pipeline-configuration-checks branch November 19, 2024 21:42
@elasticsearchmachine (Collaborator)

💔 Backport failed

Branch 8.x: Commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 117038

@nielsbauman (Contributor)

This is related to #97382. As @joegallo mentioned, this doesn't change the big-O of the cluster state appliers, but it probably helps a lot - seeing as the hot threads in that issue all include the (now improved) getConfigAsMap.

@joegallo what do you think, should we close that issue since this change probably helped a lot, or do we keep it open because the big-O of those appliers hasn't changed? I'm inclined to go with the former. If anyone runs into this again, we can always open the issue again.

@joegallo (Contributor Author) commented Nov 20, 2024

Yeahhh, that has to do with the optimization of the cluster state appliers rather than the cluster state listeners, but indeed my changes would very much have improved them, too, or so I'd think. I'm going to spend half an hour microbenchmarking that to see if the difference is measurable. Assuming it is, I'll update the description of this PR and close that issue. Very nice catch, @nielsbauman!

@joegallo (Contributor Author)

Adding 2000 ingest pipelines on a three node 8.15.4 cluster took 31.6 minutes, doing the same on 9.0.0 with this PR merged took 18.05 minutes, so that's a noticeable improvement.

Here's the CPU utilization for both of those clusters while the pipelines were being added, which really drives home how much less work the 9.0.0 version was doing (note that you can see my 30-minutes-versus-18-minutes claim in the charts, too):

[Screenshots: CPU utilization charts, 2024-11-20 at 12:50:30 PM and 12:50:39 PM]

@joegallo (Contributor Author)

Based on the above, I do think it's fair to close #97382. Indeed, it doesn't change the big-O runtime there (we're still checking all the pipelines), but it does mean we're not parsing the x-content of the pipelines during that loop, so it's probably fast enough now to be considered 'solved'.

@nielsbauman (Contributor)

That's a speed-up if I've ever seen one! Definitely worth a bi-weekly highlight at the very least.

rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Nov 20, 2024
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Nov 29, 2024
We don't need to pass around the REST request body verbatim when
processing a put-pipeline request, parsing it repeatedly as it's needed.
Instead we can parse the body once when starting to process the request
on the master, and pass the parsed `Map` around instead.

Relates elastic#117038
Successfully merging this pull request may close these issues.

Improve Ingest Pipelines comparison when applying cluster state updates
6 participants